33 research outputs found

    Linking the Resource Description Framework to cheminformatics and proteochemometrics

    Background: Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have been used in the molecular sciences for more than a decade, but have not yet found widespread use. The semantic web technology Resource Description Framework (RDF) and related methods are proving sufficiently versatile to change that situation. Results: The work presented here focuses on linking RDF approaches to existing molecular chemometrics fields, including cheminformatics, QSAR modeling and proteochemometrics. Applications are presented that link RDF technologies to methods from statistics and cheminformatics, including data aggregation, visualization, chemical identification, and property prediction, and demonstrate how this can be done using various existing RDF standards and cheminformatics libraries. For example, we show how IC50 and Ki values are modeled for a number of biological targets using data from the ChEMBL database. Conclusions: We have shown that existing RDF standards can suitably be integrated into existing molecular chemometrics methods. Platforms that unite these technologies, like Bioclipse, make this even simpler and more transparent. Being able to create and share workflows that integrate data aggregation and analysis (visual and statistical) is beneficial to interoperability and reproducibility. The current work shows that RDF approaches are sufficiently powerful to support molecular chemometrics workflows.
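
    As a rough illustration of the kind of data aggregation described above, the sketch below retrieves IC50 and Ki activities for a single target from a ChEMBL RDF endpoint using SPARQLWrapper. The endpoint URL, the cco: vocabulary terms and the example target URI are assumptions about the ChEMBL-RDF release, not taken from the paper itself.

```python
# Minimal sketch: pull IC50/Ki activities for one target from a ChEMBL RDF
# endpoint. Endpoint URL, cco: predicates and the target URI are assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://www.ebi.ac.uk/rdf/services/chembl/sparql"  # assumed endpoint

query = """
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
SELECT ?molecule ?type ?val ?units
WHERE {
  ?activity a cco:Activity ;
            cco:hasMolecule ?molecule ;
            cco:standardType ?type ;
            cco:standardValue ?val ;
            cco:standardUnits ?units ;
            cco:hasAssay ?assay .
  ?assay cco:hasTarget <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL240> .
  FILTER (?type IN ("IC50", "Ki"))
}
LIMIT 100
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Collect (molecule URI, activity type, value) tuples for downstream modelling.
rows = [(b["molecule"]["value"], b["type"]["value"], float(b["val"]["value"]))
        for b in results["results"]["bindings"]]
print(len(rows), "activities retrieved")
```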

    Experiences with workflows for automating data-intensive bioinformatics

    High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and to carry out data management and analysis tasks at large scale. Workflow systems can simplify the construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The participating organizations work on similar problems, but have addressed them with different strategies and solutions. This fragmentation of effort is inefficient and leads to redundant and incompatible solutions. Based on our experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.

    Reproducible Data Analysis in Drug Discovery with Scientific Workflows and the Semantic Web

    The pharmaceutical industry is facing a research and development productivity crisis. At the same time, we have access to more biological data than ever, thanks to recent advances in high-throughput experimental methods. One suggested explanation for this apparent paradox is that a crisis in reproducibility has also affected the reliability of the datasets providing the basis for drug development. Advanced computing infrastructures can to some extent help in this situation, but they come with their own challenges, including increased technical debt and the opaqueness that follows from the many layers of technology required to perform computations and manage data. In this thesis, a number of approaches and methods for dealing with data and computations in early drug discovery in a reproducible way are developed, while striving for a high level of simplicity in their implementation to improve the understandability of the research done using them. Based on identified problems with existing tools, two workflow tools have been developed with the aim of making the writing of complex workflows, particularly in predictive modelling, more agile and flexible. One of the tools is based on the Luigi workflow framework, while the other is written from scratch in the Go language. We have applied these tools to predictive modelling problems in early drug discovery to create reproducible workflows for building predictive models, including for prediction of off-target binding. We have also developed a set of practical tools for working with linked data in a collaborative way and for publishing large-scale datasets in a semantic, machine-readable format on the web. These tools were applied to demonstrator use cases and used for publishing large-scale chemical data. It is our hope that the developed tools and approaches will contribute towards practical, reproducible and understandable handling of data and computations in early drug discovery.
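
    One of the two workflow tools described here builds on the Luigi framework; as a point of reference, a minimal Luigi pipeline of the kind such tools aim to make more agile might look like the sketch below. The task names, file paths and placeholder "training" step are hypothetical and are not taken from the thesis.

```python
# Minimal Luigi pipeline sketch: one task produces a dataset, a second task
# "trains" a placeholder model from it. Names and paths are hypothetical.
import luigi

class PrepareData(luigi.Task):
    def output(self):
        return luigi.LocalTarget("training_set.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("compound,activity\nC1,5.2\nC2,7.1\n")

class TrainModel(luigi.Task):
    def requires(self):
        return PrepareData()

    def output(self):
        return luigi.LocalTarget("model.txt")

    def run(self):
        with self.input().open() as f:
            n_rows = sum(1 for _ in f) - 1  # placeholder "training" step
        with self.output().open("w") as f:
            f.write(f"trained on {n_rows} compounds\n")

if __name__ == "__main__":
    luigi.build([TrainModel()], local_scheduler=True)
```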

    SWI-Prolog as a Semantic Web Tool for semantic querying in Bioclipse: Integration and performance benchmarking

    The huge amounts of data produced by high-throughput techniques in the life sciences, and the need to integrate heterogeneous data from disparate sources in new fields such as Systems Biology and translational drug development, require better approaches to data integration. The semantic web is anticipated to provide solutions through new formats for knowledge representation and management. Software libraries for semantic web formats are becoming mature, but multiple tools exist that are based on foundationally different technologies. SWI-Prolog, a tool with semantic web support, was integrated into the Bioclipse bio- and cheminformatics workbench and evaluated in terms of performance against the non-Prolog-based semantic web tools in Bioclipse, Jena and Pellet, by querying a data set of mostly numerical NMR shift values in the semantic web format RDF. The integration gives access to the convenience of the Prolog language for working with semantic data and for defining data management workflows in Bioclipse. The performance comparison shows that SWI-Prolog outperforms Jena and Pellet for this specific dataset and suggests that Prolog-based tools are interesting candidates for further evaluation.
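
    For comparison, the kind of query run in the benchmark, selecting NMR shift values from an RDF graph, could be expressed with rdflib roughly as below; the file name and predicate URIs are hypothetical placeholders, not the actual schema of the benchmarked dataset.

```python
# Hypothetical sketch: querying NMR shift values from an RDF graph with rdflib.
# The file name and ex: predicate URIs are placeholders, not the real schema.
import rdflib

g = rdflib.Graph()
g.parse("nmr_shifts.rdf")  # placeholder RDF/XML file

query = """
PREFIX ex: <http://example.org/nmr#>
SELECT ?spectrum ?shift
WHERE {
  ?spectrum ex:hasPeak ?peak .
  ?peak ex:shiftValue ?shift .
  FILTER (?shift > 100.0)
}
"""

for spectrum, shift in g.query(query):
    print(spectrum, float(shift))
```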

    Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles

    Predictive modelling in drug discovery is challenging to automate, as it often involves multiple analysis steps and may include cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data, or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can help with many of these challenges, but currently available systems lack the functionality needed to enable agile and flexible predictive modelling. Here we present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system, which we name SciLuigi. We also discuss experiences from using the approach when modelling a large set of biochemical interactions on a shared computer cluster.
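
    As a rough illustration of the flow-based style, tasks expose named out-ports (methods returning target information) and in-ports (fields) that are wired together explicitly in a workflow class. The sketch below follows this pattern; the class and function names approximate the published SciLuigi API, and the tasks themselves are hypothetical placeholders.

```python
# Sketch of a SciLuigi-style workflow where dependencies are wired as named
# ports in a workflow class. Names approximate the SciLuigi API; tasks are
# hypothetical placeholders.
import sciluigi as sl

class MakeData(sl.Task):
    def out_data(self):
        return sl.TargetInfo(self, "rawdata.csv")

    def run(self):
        with self.out_data().open("w") as f:
            f.write("compound,activity\nC1,5.2\n")

class TrainModel(sl.Task):
    in_data = None  # in-port, wired in the workflow below

    def out_model(self):
        return sl.TargetInfo(self, "model.txt")

    def run(self):
        with self.in_data().open() as f:
            n = sum(1 for _ in f) - 1  # placeholder "training" step
        with self.out_model().open("w") as f:
            f.write(f"trained on {n} compounds\n")

class PredictWorkflow(sl.WorkflowTask):
    def workflow(self):
        make = self.new_task("make_data", MakeData)
        train = self.new_task("train_model", TrainModel)
        train.in_data = make.out_data  # explicit port wiring
        return train

if __name__ == "__main__":
    sl.run_local(main_task_cls=PredictWorkflow)
```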

    Software engineering for scientific big data analysis

    The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training in software engineering, find themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance on how to advance to the next level: developing robust, large-scale data analysis tools that are amenable to integration into workflow management systems, tools, and frameworks. Integration into such workflow systems places additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control. Here we provide a set of ten guidelines to steer the creation of command-line computational tools that are usable, reliable, extensible, and in line with the standards of modern coding practice.
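
    As an illustration of the kind of conventions such guidelines address, rather than a restatement of the ten guidelines themselves, a small workflow-friendly command-line tool with explicit arguments, logging, stdin/stdout streaming and meaningful exit codes might look like this sketch (the tool and its behaviour are hypothetical).

```python
#!/usr/bin/env python3
# Sketch of a workflow-friendly command-line tool: argument parsing, logging,
# stdin/stdout streaming, and a non-zero exit code on failure.
import argparse
import logging
import sys

def main() -> int:
    parser = argparse.ArgumentParser(description="Filter rows by a minimum value.")
    parser.add_argument("-i", "--input", type=argparse.FileType("r"),
                        default=sys.stdin, help="input TSV (default: stdin)")
    parser.add_argument("-o", "--output", type=argparse.FileType("w"),
                        default=sys.stdout, help="output TSV (default: stdout)")
    parser.add_argument("--min-value", type=float, required=True)
    parser.add_argument("--log-level", default="INFO")
    args = parser.parse_args()

    logging.basicConfig(level=args.log_level,
                        format="%(asctime)s %(levelname)s %(message)s")
    kept = 0
    try:
        for line in args.input:
            name, value = line.rstrip("\n").split("\t")
            if float(value) >= args.min_value:
                args.output.write(line)
                kept += 1
    except ValueError as err:
        logging.error("Malformed input line: %s", err)
        return 1  # non-zero exit code signals failure to the workflow engine

    logging.info("Kept %d rows", kept)
    return 0

if __name__ == "__main__":
    sys.exit(main())
```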

    Raw datasets for Large Scale SVM Experiment(s)

    Raw data for the study "Large-scale ligand-based predictive modelling using support vector machines", available at https://pharmb.io/publication/2016-large-scale-svm and http://dx.doi.org/10.1186/s13321-016-0151-5
    Changelog:
    V2: Add text files with checksums (md5 and sha1).
    V3: Remove old tarball without checksums.
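
    The checksum files added in V2 can be used to verify a downloaded archive; a minimal Python sketch is shown below, with hypothetical file names standing in for the actual dataset files.

```python
# Verify a downloaded dataset tarball against a published md5 checksum file.
# The file names are hypothetical placeholders for the actual dataset files.
import hashlib

def file_digest(path, algorithm):
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

expected_md5 = open("dataset.tar.gz.md5").read().split()[0]
actual_md5 = file_digest("dataset.tar.gz", "md5")
print("md5 OK" if actual_md5 == expected_md5 else "md5 MISMATCH")
```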